Biological systems, from cells to organisms, must respond to the ever-changing environment in
order to survive and function. This is not a simple task given the often random nature of the signals
they receive, the intrinsically stochastic, many-body and often self-organized nature of the
processes that control their sensing and response, and their limited resources. Despite the wide
range of scales and functions observed in the living world, some common principles that govern
the behavior of biological systems emerge. Here I review two examples of very different biological
problems: information transmission in gene regulatory networks and the diversity of adaptive immune
receptor repertoires that protect us from pathogens. I discuss the trade-offs that physical laws
impose on these systems and show that the optimal designs of both immune repertoires and gene
regulatory networks display similar discrete tiling structures. These solutions rely on locally
non-overlapping placements of the responding elements (genes and receptors) that, overall, cover
space nearly uniformly.


I. INTRODUCTION 

A fascinating aspect of biological systems is the emergence of large-scale reproducible function
from the small-scale molecular interactions between cellular elements (proteins, genes). Living
systems, both whole organisms and molecular units, amaze us by the precision of their performance.
How is this precision achieved under the physical constraints that biology must obey? One way to
approach this question is to note that many biological systems display emergent behavior:
macroscopic, stereotyped phenomena that cannot be explained merely by composing the properties of
the system's underlying elementary, and intrinsically noisy, units. As we know from physics,
nontrivial emergent behavior often results from interactions on different length, time, or energy
scales. These effects are also ubiquitous in biological systems, for example in single cells
expressing certain subsets of genes, in highly orchestrated multi-cellular programs such as
development, or in the reliable response of the adaptive immune system against attacking pathogens.
Examples of correlated phenomena have been extensively studied in statistical physics for the past
century, increasing our understanding of many-body interactions in condensed matter systems and
leading to technological advances.

Concurrently, recent advances in experimental technologies give us great insight into the
functioning of biological systems at the molecular, intracellular level, at the level of
large-scale functional systems in the organism, and at the level of the behavior of large groups of
animals. These technical developments allow us to make quantitative measurements of their
constitutive elements and link them to their function. Trying to understand the functioning of
these various systems, we see the emergence of common principles governing their behavior, despite
the large biological differences, their functioning at different scales, and their fulfilling very
different functions. In recent years physicists have become more interested in how physical
principles are realized in cells. In the last decade, such an approach of taking inspiration from
different biological systems (such as vertebrate development, chemotaxis, fly development,
olfaction, visual processing) has proven very fruitful in proposing potential design principles
(e.g. error correction, noise minimization, information transmission, acquiring information [12],
speed and accuracy of decision making, minimax strategies, evolvability, optimization of resources
[19-21]) that govern how physical laws are realized in living organisms. The lessons learned from
these theoretical ideas have pushed the limits of experiments in concrete systems and often
questioned our understanding of basic physical and biological processes.

Biological systems perform a function, limited both by the physical laws they must obey and by the
limited resources in the environment they find themselves in. Functioning efficiently and reliably
in a given environment requires matching the statistical properties of the system to those of the
environment, as has been discussed in the context of neuroscience [22, 23]. If infinitely many
sensing elements were available, the environment could be sensed up to the limits imposed by
intrinsic physical noise. Of course this is not the reality of any biological system, where sensing
and response must be fast and reliable, and natural trade-offs appear in the design of these
systems. If we assume that the structure of biological systems makes it possible for them to
reliably interact with their environment, we can attempt to understand which elements of form are
linked to certain functions.

Here I will discuss two very different systems that perform two very different functions: genes and
their regulatory proteins, inspired by regulation in developmental systems, and the ensemble
(called the repertoire) of receptors expressed on the surface of immune cells. Generally, the goal
when sensing is to cover the whole input space in such a way that each part of this space is well
covered, given the constraints of limited resources. The detailed description and formulation of
the goals of these two systems are very different, but similar trade-offs appear in the two
contexts. As we shall see, the optimal solutions that these two systems find are very similar,
although they are solutions to very different problems that involve an adequate, yet different in
nature, response to their environments. In short, they both involve tiling the input space, be it
the input concentration of a developmental gradient or the current distribution of antigens
(elements of pathogens), with their sensory elements. I will concentrate on these two examples
coming from my own work. However, the idea of tiling by sensory systems has been widely explored in
neuroscience (where it is termed "lateral inhibition") and comes about naturally in information
theory. I will briefly mention these two cases in the discussion. I will start by explaining the
problem of interest in each of the systems and show how tiling solutions emerge in both cases
before discussing the differences and similarities between them.

The work presented here is a review of work I did with different collaborators, all of whom have
been exploring how sensory systems function. The work on the gene regulatory system inspired by fly
development was done in collaboration with Gasper Tkacik and William Bialek [7, 21, 26]. I
considered the question of optimal immune repertoires with Andreas Mayer, Vijay Balasubramanian and
Thierry Mora [27]. In this review I chose to present only one aspect of the results obtained for
these two systems, one that is common to both: tiling. The analysis in the original papers has many
different perspectives that I do not discuss here.


II. GENE REGULATION 

An important question in developmental biology is how the symmetry at the early stages of the
embryo is stably broken to create structured organisms given the inherent cellular stochasticity.
Examples of symmetry breaking are observed in development, where the mother lays the foundation for
gradients that are later translated to cell fates through a noisy signaling network. During
development cells differentiate and start expressing different sets of proteins. The fruit fly
(Drosophila melanogaster) is a model organism for studying early embryonic development and cell
differentiation [28, 29]. The fly mother produces bicoid mRNAs, which are laid in the anterior of
the egg. As the proteins translated from these mRNAs diffuse away from the pole they establish a
decaying anterior-posterior protein gradient. Together with other maternal proteins (e.g. Nanos),
Bicoid proteins locally regulate the expression of a set of downstream "gap genes" (hunchback,
kruppel, knirps and giant) to determine, among other features, the anterior-posterior (head to
abdomen) axis. The gap genes control the expression of downstream genes (called pair-rule genes)
that form very well defined stripes, which later lead to the formation of segments in the fly's
body. The precise positioning and width of these stripes are essential for correct development. One
of the puzzles of biology is how the position of the stripes can be controlled so accurately. All
the positional information the fly embryo has is contained in the profiles of the maternal
proteins. This information must be transmitted accurately in the different steps of gene
expression, or the developmental plan will fail.

The task of transmitting the information about the concentrations of the maternal gradients is made
more difficult by the fact that gene expression is a noisy process. On the molecular level, the
interactions between genes and proteins occur by means of chemical reactions, which are
probabilistic in nature. Furthermore, the scarcity of the reaction products increases the intrinsic
noise of the cell, requiring a stochastic framework. There is typically one active copy of DNA per
cell, a few copies of mRNA and tens to hundreds of copies of a protein of a given species. The
stochastic nature of gene expression has been confirmed experimentally.
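The effect of small copy numbers on noise can be illustrated with a minimal numerical sketch. It
assumes, as an idealization that is not taken from the text, that molecule counts are Poisson
distributed, in which case the relative fluctuation (standard deviation over mean) scales as one
over the square root of the mean copy number. The function name `relative_noise` and the copy
numbers used are hypothetical.

```python
import numpy as np

# Assumption (illustrative only): copy numbers are Poisson distributed.
rng = np.random.default_rng(seed=0)

def relative_noise(mean_copies, n_cells=100_000):
    """Estimate std/mean of molecule counts across a simulated cell population."""
    counts = rng.poisson(lam=mean_copies, size=n_cells)
    return counts.std() / counts.mean()

# Hypothetical copy numbers: a few mRNAs versus hundreds of proteins per cell.
for mean_copies in (4, 100, 400):
    # Second column: simulated value; third column: 1/sqrt(mean) for comparison.
    print(mean_copies, relative_noise(mean_copies), 1 / np.sqrt(mean_copies))
```

Under this assumption a species present at a few copies per cell fluctuates by tens of percent,
while one present at hundreds of copies fluctuates by a few percent.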

Many elements of the gene regulatory network that regulates the expression of stripes have been
mapped out. Owing to the experimental and theoretical advances of the last decade, we now have a
good understanding of the molecular details of the basic forms of gene regulation. At the same
time, our understanding of the basic components of molecular noise has increased. We can use this
knowledge to go beyond the simple characterization of gene regulatory networks and ask whether we
can identify the physical principles that govern the observed behavior of circuits. Specifically,
can we understand why the early steps of cell differentiation in the fly embryo follow this
specific pattern? How do the specific regulatory elements come together in space and time, and
which parts of the regulatory process control which observed features?

In this initial stage of development, the continuous Bicoid concentration gradient gets translated
into localized expression patterns of the gap genes: hunchback is only expressed in the first
(anterior) part of the embryo, kruppel in the middle, giant and knirps in two sets of distinctly
positioned stripes along the length of the embryo. Inspired by the regulation of the gap genes in
early fly development, Tkacik, Bialek and I were interested in understanding the circumstances
under which the expression of the target genes becomes localized [7, 21, 26]. Even these early
stages of fly development include a number of complicated interactions between the target genes,
so we started by studying a simplified system where one continuous input (inspired by Bicoid)
regulates L independent downstream genes, inspired by the L = 4 gap genes (see Fig. 1). We asked
which regulatory interactions between the one input and the L output genes maximize the
transmitted information between the input and the outputs.

Information between the input and output is an intuitive concept that is also formally defined as
mutual information in terms of the difference of entropies between the output distribution,
$S[P(\{g\})]$, and the conditional distribution of the output given the input, $S[P(\{g\}|c)]$:

$$ I(c; \{g\}) = \int dc\, P(c) \left( S[P(\{g\})] - S[P(\{g\}|c)] \right). \qquad (1) $$

Entropy measures the uncertainty of a given distribution, $S[P(x)] = -\int dx\, P(x) \log P(x)$,
where $x$ is equal to $\{g\}$ and to $\{g\}$ conditioned on $c$ in Eq. (1). Knowing the entropy of
the output distribution gives us a measure of our uncertainty of the output. However, knowing the
value of the output given a particular input further constrains our uncertainty. That reduction of
our uncertainty about the output is precisely the information we have gained by measuring the
input. The subject of information in gene regulation has previously been reviewed and the
interested reader can refer to [55] for more details.

FIG. 1: Gene regulatory networks respond to input signals by producing output proteins. Regulatory
functions that optimize the information between the input and output require matching the
statistics of the input distribution $P(c)$ with the properties of the network $P(\{g\}|c)$. To do
that we need to specify the nature of the regulation, which we assume is well described by its
mean regulatory function $\{\bar{g}\}$ and Gaussian noise,
$P(\{g\}|c) \sim \mathcal{N}(\{\bar{g}\}, \Sigma_c)$. The molecular biophysical properties of the
network are summarized in the form of the input/output function and the noise. For clarity of
illustration this is portrayed on the example of one gene, but the picture generalizes to L genes.
Optimizing information with respect to the input distribution and the properties of the network
results in the optimal regulatory functions. The optimal functions were obtained assuming Hill
regulatory functions, and are shown in Fig. 2.

To make progress, we also need to characterize the regulatory network. We can assume that the
conditional distribution of the output given the input that describes the regulation function is
well approximated as a Gaussian with covariance $\Sigma(c)$ around the deterministic regulatory
function. In general, for L interacting output genes $\Sigma(c)$ is the covariance matrix of size
$L \times L$ of the fluctuations in the expression levels $\{g\}$ at fixed input $c$. However, in
the specific case of L non-interacting genes discussed in most of this review, $\Sigma(c)$ only has
diagonal entries, coming from the variance of each gene's own output at the global input $c$,
$\Sigma_{ij}(c) = \sigma_i^2(c)\,\delta_{ij}$. The Gaussian assumption effectively reduces the
problem of describing the input and output probability distributions,
$P(\{g\}) = \int dc\, P(\{g\}|c) P(c)$, to knowing the input distribution and the parameters of the
L-dimensional Gaussian distribution, i.e. its mean $\{\bar{g}(c)\}$ and covariance matrix
$\Sigma(c)$. Since this is rather repetitive to draw for L non-interacting genes, Fig. 1 depicts
the parametrization of the network for one output gene.

We can now use our knowledge of the regulatory interactions and noise properties in gene regulation
and account for all the biophysical constraints by parametrizing the variance and the mean
regulatory functions. We chose a basic thermodynamic Hill model for regulation (see below) and
assumed that most of the noise comes from the random production of proteins (Poisson noise, first
term in Eq. (2)) and the diffusion-limited switching of the genes (a Berg-Purcell term [44], second
term in Eq. (2)):

$$ \sigma_i^2(c) = \frac{1}{N_{\max}} \left[ g_i(c) + c\, c_0 \left( \frac{d g_i(c)}{dc} \right)^2 \right], \qquad (2) $$

where $N_{\max}$ is the maximum number of independent molecules that are made from gene $i$ and
$c_0 = N_{\max}/(D a \tau)$ is the characteristic concentration scale composed of the
characteristic length and timescales for regulation (the diffusion constant $D$, the size of the
target binding site $a$ and the input integration time $\tau$). By doing this we effectively
parametrize the variance by the mean input-output relation, $g_i(c)$. The Hill functions that
describe $g_i(c)$ are sigmoidal smooth monotonic functions of the input concentration,

$$ g_i(c) = \frac{c^{h_i}}{c^{h_i} + K_i^{h_i}}, \qquad (3) $$


parametrized by the concentration which results in half-maximal expression of the gene, $K_i$,
called the dissociation constant, and the steepness of the regulatory function $h_i$, which is
linked to the cooperativity of the molecular reactions involved in regulation. The form of this
expression can be derived from thermodynamic arguments with $K_i = \exp(-F/k_B T)$, where $F$ is
the free energy of binding per input molecule. The sign of the cooperativity parameter $h_i$
differentiates between activation ($h_i > 0$) and repression ($h_i < 0$) of the target gene by the
input, and its value interpolates between linear regulation ($|h_i| = 1$) and threshold switching
($h_i \to \infty$). Each gene responds by producing a differentiable output in a limited input
concentration range $\sim K_i/h_i$, set by the cooperativity coefficient and measured in units of
$K_i$, around the concentration midpoint given by the dissociation constant $K_i$. Below that range
the gene is essentially off, and above it the gene has saturated its expression. More details of
the parametrization can be found in [7, 21, 26]. In summary, this parametrization allows us to
describe the properties of the network in terms of two parameters for each gene: the dissociation
constant $K_i$, which describes the positioning of the gene in the input concentration range, and
the cooperativity coefficient $h_i$, which sets the range of inputs the gene is responsive to. To
find the optimal network we must find the optimal values of these parameters.
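The parametrization can be sketched numerically. The block below assumes the standard Hill form
g(c) = c^h / (c^h + K^h), with expression normalized to lie between 0 and 1, together with the
noise model of Eq. (2). The function names and the default values of `N_max` and `c0` are
hypothetical choices for illustration, not values from the original papers.

```python
import numpy as np

def hill(c, K, h):
    """Mean expression level, assumed Hill form, normalized to [0, 1]."""
    x = (c / K) ** h
    return x / (1.0 + x)

def hill_slope(c, K, h):
    """Analytic derivative dg/dc = h * g * (1 - g) / c of the Hill function."""
    g = hill(c, K, h)
    return h * g * (1.0 - g) / c

def output_variance(c, K, h, N_max=100.0, c0=1.0):
    """Noise variance: Poisson output term plus Berg-Purcell input term."""
    g = hill(c, K, h)
    return (g + c * c0 * hill_slope(c, K, h) ** 2) / N_max

# Hypothetical gene: activator with K = 2, h = 4 (concentrations in units of c0).
K, h = 2.0, 4.0
print(hill(K, K, h))             # half-maximal expression at c = K
print(hill_slope(K, K, h))       # slope h / (4 K) at the midpoint
print(output_variance(K, K, h))  # noise variance at the midpoint
```

A negative `h` turns the same function into a repressor, with high expression at low input
concentration.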

Biological systems must obey a number of constraints, including paying a cost for producing
molecules. We can now optimize the mutual information with respect to the input distribution given
this molecular constraint, which we include by limiting the range of input molecules (technically
imposing an upper integration bound for input concentrations), and demanding that the distribution
is normalized:

$$
\begin{aligned}
\frac{\delta}{\delta P(c)} \Big[ & S[P(\{g\})] - \int dc\, P(c)\, S[P(\{g\}|c)] \qquad & (4) \\
& + \Lambda \int dc\, P(c) \Big] = 0, \qquad & (5)
\end{aligned}
$$

where $\Lambda$ is a Lagrange multiplier enforcing the normalization of $P(c)$.


The expression is easily solved, because we have parametrized $P(\{g\}|c)$ as a Gaussian defined
explicitly in terms of the biophysical rates of the problem. The optimal input distribution
$P^*(c)$ is expressed in terms of the uncertainty of the input $c$ given the outputs $\{g\}$, given
by the variance of the posterior $\sigma_c^2(\{g\})$:
with the normalization given by
The last line in Eq. (7) explicitly gives the expression for L non-interacting genes, whereas the
other expressions keep the general form for possibly interacting genes. Given the optimal input
distribution we can calculate the optimal information of the system

$$ I^*(c; \{g\}) = \log Z. \qquad (11) $$

The details of this calculation can be found in [7]. The optimization problem requires finding the
optimal parameters of the regulatory functions ($\{K_i, h_i\}_{i=1,\dots,L}$) that maximize
$\log Z$, whose form is determined by the noise. The optimal solutions are a result of the balance
between the two sources of noise in Eq. (2): the input noise coming from fluctuations in regulatory
protein concentrations (second term in Eq. (2)), and the output noise caused by the small number of
produced proteins $g_i$ (first term in Eq. (2)). At small input concentrations the fluctuations
from an unreliable readout of $c$ dominate and push the solutions to have higher values of $K_i$,
whereas the need to distinguish different levels of outputs reliably decreases the steepness of the
regulatory functions and forces them to also use the smaller concentration ranges, decreasing $K_i$
and $h_i$. The actual parameters of the regulatory functions need to be optimized numerically, and
for large concentration ranges of input molecules we obtained the characteristic optimal regulatory
functions shown in panels B-E of Fig. 2. The different panels show the solutions for increasing
input concentration ranges. In these solutions each gene is effectively regulated in a finite
localized input concentration regime. For small concentrations the first gene is expressed and when
it saturates, the second gene is expressed. This trend continues and the gene expression domains
tile space.
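The dependence of log Z on the regulatory parameters can be explored with a small numerical sketch.
The block relies on stated assumptions rather than on equations reproduced in this text: in the
small-noise approximation, Z is taken to be the integral over c of [2 pi e sigma_c^2(c)]^(-1/2),
and for non-interacting genes the contributions are taken to add,
1/sigma_c^2 = sum_i (dg_i/dc)^2 / sigma_i^2. It also assumes the Hill form
g_i = c^h_i / (c^h_i + K_i^h_i) and the noise model of Eq. (2). The function name `log_Z`, the
grid, and all parameter values are hypothetical.

```python
import numpy as np

def log_Z(K, h, c_max, N_max=100.0, c0=1.0, n_grid=20001):
    """log Z for L non-interacting genes under the small-noise assumption.

    K, h: sequences of dissociation constants and Hill coefficients.
    Concentrations are measured in the same units as c0.
    """
    # Start slightly above zero to avoid dividing by c = 0 in the slope.
    c = np.linspace(1e-3 * c_max, c_max, n_grid)
    inv_var_c = np.zeros_like(c)  # accumulates 1 / sigma_c^2 over genes
    for K_i, h_i in zip(K, h):
        x = (c / K_i) ** h_i
        g = x / (1.0 + x)                        # assumed Hill function
        dg = h_i * g * (1.0 - g) / c             # its analytic slope
        var_g = (g + c * c0 * dg ** 2) / N_max   # noise model of Eq. (2)
        inv_var_c += dg ** 2 / var_g
    integrand = np.sqrt(inv_var_c / (2.0 * np.pi * np.e))
    # Trapezoidal rule written out explicitly.
    Z = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(c))
    return np.log(Z)

# Hypothetical comparison at a large input range: two identical genes
# versus two genes with staggered dissociation constants.
c_max = 100.0
print("redundant:", log_Z([10.0, 10.0], [2.0, 2.0], c_max))
print("staggered:", log_Z([5.0, 40.0], [2.0, 2.0], c_max))
```

Which arrangement gives the larger value depends on `c_max` and on the parameters. The optimal
solutions described in the text come from a full numerical optimization over all K_i and h_i,
which this sketch does not perform.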

The tiling solution is the most informative solution when the concentration range of input
molecules is large. When the concentration range of the input molecules is small (see Fig. 2A), the
optimal solution consists of all genes making the same readout of the input by having exactly the
same regulatory function. In this case the genes no longer tile space, but repeat the same
measurement to minimize the error coming from reading out small concentrations. The transition from
one regime to another is continuous as a function of the input concentration range and information.
This can be explained in terms of the dissociation constants of each gene, which give the
concentration value at which a gene is expressed at half maximum. Either there is enough
concentration range to use the discrete gene readout, in which case the genes have different values
of the dissociation constants, or it is better to attempt a reliable readout in one concentration
regime, and the dissociation constants collapse. The transition from the non-tiling to the tiling
regime in terms of the dissociation constants is shown in Fig. 2F. Figure 2 also shows that the
tiling of genes is gradual and the redundancy is lifted one gene at a time as the input
concentration range increases.
FIG. 2: The most informative regulatory functions $\{g_1(c), \dots, g_5(c)\}$ (shown in different
colors) for a gene regulatory network with one input $c$ regulating L = 5 non-interacting output
genes, for increasing ranges of the maximum input concentration $c_{\max}$ (A-E). At very small
values of $c_{\max}$ all L genes have the same regulatory function, making the readout completely
redundant. For increasing values of $c_{\max}$ the redundancy of the readout is lifted, as
successive genes individually cover particular subranges of the input concentration. (F) The
emergence of the tiling solution in terms of the gene dissociation constants $K_1, \dots, K_5$ as a
function of $c_{\max}$. We assumed Hill regulatory functions and that Berg-Purcell and small
molecule noise dominate protein fluctuations, as described in [7].


III. IMMUNE RECEPTORS 


Let me now leave gene regulation and turn to a completely different system: the adaptive immune
repertoire. After this short presentation, I will return to our information-optimal gene
regulatory network and compare the characteristics of these two very different biological systems.

The role of the immune system is to protect the organism from the many pathogenic threats it
constantly encounters. To fulfill this role, it must be prepared to identify a great variety of
unknown challenges, including ones it has never been exposed to. It must thus maintain a diversity
of specialized cells, each specific to particular challenges, but which together cover the full
array of potential threats. These cells are called B and T-cells, and the particular receptors
responsible for recognition on each of these specialized cells are generated in an essentially
random manner [45]. Yet together these receptors form a diverse repertoire that allows the immune
system to fulfill its function of recognizing pathogens exceptionally well. Since not all threats
are equally likely, the immune repertoire adapts to the changing pathogenic environment, at the
same time keeping a memory of past infections. The diversity of the composition of the immune
repertoire emerges as a self-organized process, stimulated by interactions with the environment.

FIG. 3: The simplified recognition problem in the adaptive immune system: receptors from the
repertoire distribution $P_r$ recognize antigens from the environment distribution $Q_a$ with a
cross-reactive recognition probability $f_{r,a}$.

Receptor proteins on the surfaces of these cells interact with pathogens, recognize them through
specific binding and initiate the immune response. The interaction between pathogen proteins and
receptors is based on the binding of two polypeptides (one being part of the receptor, the other
being part of the pathogenic protein) and is specific, yet degenerate: a single receptor is able to
recognize more than one pathogen peptide (antigen) and, conversely, one antigen can bind to more
than one receptor. How the high-dimensional space of pathogenic peptides is covered by receptors is
a particularly challenging example of a covering problem.

The immune response is controlled by many factors on many scales; however, it is initiated when a B
or T-cell receptor successfully recognizes an element of a foreign pathogen, called an antigen. In
our approach, presented in [27], we decided to focus on this part of the puzzle and ask how immune
receptors should be distributed in order to minimize the harm from infections given a fixed, static
antigenic environment. Additionally, antigens and receptors have a limited number of encounters,
since it takes time for a receptor and an antigen to meet given the finite concentrations of both
and the size of the organism. This imposes a constraint on the efficiency of recognition. By
formulating the problem in this way, we simplified it to a static version of a covering problem
with limited resources. However, even in this simple formulation the exact meaning of most of the
terms used needs to be made precise.

Since recognition is triggered by binding of receptors and antigens, we can consider the problem in
an effective recognition space. Both types of molecules (antigens and receptors) live in this space
and they recognize each other if the distance between them in this space is small. The idea of
recognition space is similar to shape space [46], which has been very useful for decades for
describing effective antigen-receptor interactions. We do not need to further parametrize the space
to describe this interaction (for example as has been done using string models, where both
molecules are taken to be strings of effective amino acids with effective physical and biochemical
properties and their similarity is measured in terms of a Hamming distance between the strings);
however, this specific picture is a helpful concrete example of the type of effective recognition
space we have in mind. Since we are thinking about a static picture, the receptor repertoire is
described by a probability distribution $P_r$ and the fixed ensemble of antigens by $Q_a$, as
depicted in Fig. 3. A receptor and antigen can meet and recognize each other with a cross-reactive
recognition probability $f_{r,a}$. This function accounts for the fact that one receptor can
recognize many pathogens and, conversely, one antigen can be recognized by many receptors. Given
the recognition probability, the probability of an immune response from an encounter of a random
receptor with a given antigen, $a$, is $\tilde{P}_a = \sum_r f_{r,a} P_r$.

Since we are interested only in the consequences (recognition or non-recognition) of encounter
events, we choose to measure time in the mean number of encounters $m$. The recognition events are
random and Poisson distributed in $m$. The limitation on the number of encounters can also be
understood in terms of finite sampling of the receptors by the antigen. As the number of encounters
increases while the antigen remains unrecognized, the effective cost of the infection increases
according to a function $F_a(m)$, due to the damage caused by the potentially proliferating antigen
to the tissues of the organism. Therefore, to obtain the total harm caused by a given antigen we
need to integrate the effective cost over the number of encounters, weighted by the distribution of
successful recognition encounters:

$$ \bar{F}_a(P_r) = \int_0^{+\infty} dm\, F_a(m)\, \tilde{P}_a\, e^{-m \tilde{P}_a}. \qquad (12) $$

Finally, the overall cost to the organism needs to take into account the costs from all the
antigens:

$$ \mathrm{Cost}(\{P_r\}) = \sum_a Q_a\, \bar{F}_a(P_r). \qquad (13) $$

This cost accounts for the trade-off between having to distribute many receptors given a finite
number of encounters and the fact that the total harm caused by infections increases with time.
Given a fixed antigen distribution $Q_a$, we find the optimal distribution $P_r$ of receptors that
minimizes this cost.
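The cost in Eqs. (12) and (13) can be evaluated numerically for a small discrete example. The
sketch below assumes a linear harm function F_a(m) = m, chosen purely for illustration, for which
the integral in Eq. (12) has the closed form 1 / P~_a. The function names, the integration cutoff
`m_max`, and the toy system of two receptors and two antigens are hypothetical.

```python
import numpy as np

def recognition_probability(P, f):
    """P~_a = sum_r f[r, a] * P[r]: chance that a random receptor recognizes antigen a."""
    return np.asarray(P, dtype=float) @ np.asarray(f, dtype=float)

def mean_harm(p_tilde, harm=lambda m: m, m_max=200.0, n_grid=200001):
    """Eq. (12) by the trapezoidal rule, truncated at m_max.

    The cutoff assumes m_max * p_tilde is much larger than 1.
    """
    m = np.linspace(0.0, m_max, n_grid)
    integrand = harm(m) * p_tilde * np.exp(-m * p_tilde)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(m))

def total_cost(P, Q, f, harm=lambda m: m):
    """Eq. (13): antigen-weighted sum of the mean harm."""
    p_tilde = recognition_probability(P, f)
    return sum(q * mean_harm(p, harm) for q, p in zip(Q, p_tilde))

# Toy example: two antigens, two receptors, no cross-reactivity (f = identity).
f = np.eye(2)
Q = [0.5, 0.5]
print(total_cost([0.5, 0.5], Q, f))  # receptors matched to the antigen frequencies
print(total_cost([0.9, 0.1], Q, f))  # receptors concentrated on one antigen
```

With the linear harm function the matched repertoire gives a cost close to 2, while the skewed one
is penalized by the poorly covered antigen.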

The details of this optimization, as well as some analytical intuition gained from limiting cases,
are discussed in detail in [27]. The interesting results for the purpose of the current discussion
are best seen in the numerical results recounted in Fig. 4A for a two-dimensional random antigen
distribution. We see that the optimal receptor distribution tiles space in a random way: the
distribution is discrete, with receptors positioned as individual non-overlapping peaks in
recognition space. To quantify this pattern in more detail we can look at the radial distribution
function as a function of the distance between the receptor positions (Fig. 4C). We see a strongly
repelling core at small distances, which forbids placing receptors too close to one another, and
characteristic regular peaks at larger distances that indicate likely positions of the receptors.
Analyzing the structure factor confirms our intuition that the precise placement of the receptors
is not important (Fig. 4D). The structure factor at short wavelengths (large q), corresponding to
short inter-receptor distances, goes to 1, as it does in a liquid or disordered glass. However, at
large scales the pattern is completely reproducible, as quantified by the structure factor going to
zero at long wavelengths (small q). Cross-reactivity allows one receptor to cover all the antigens
within a given range and results in the hard core, whereas needing to protect against even the rare
but potentially dangerous antigens requires a thorough coverage of the whole space.
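The radial distribution function used in this analysis can be computed from a set of peak positions
as follows. This is a minimal sketch, not the analysis code of the original paper: it assumes
point-like receptor positions in a two-dimensional periodic square box and normalizes pair counts
by the expectation for uniformly random points. The function name, the box, and the binning are
hypothetical.

```python
import numpy as np

def radial_distribution(points, box=1.0, r_max=0.48, n_bins=12):
    """Pair-distance histogram g(R), normalized so that g = 1 for uniform random points.

    points: array of shape (n, 2) inside a periodic square box of side `box`.
    r_max must stay below box / 2 for the minimum-image convention to hold.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    diff -= box * np.round(diff / box)            # nearest periodic image
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[np.triu_indices(n, k=1)]     # each unordered pair once
    counts, edges = np.histogram(pair_dist, bins=n_bins, range=(0.0, r_max))
    shell_area = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    expected = 0.5 * n * (n - 1) * shell_area / box ** 2
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, counts / expected

# Check on a regular 10 x 10 lattice (spacing 0.1): no pairs closer than the spacing.
xs = (np.arange(10) + 0.5) / 10
lattice = np.array([(x, y) for x in xs for y in xs])
r, g = radial_distribution(lattice)
for r_i, g_i in zip(r, g):
    print(f"{r_i:.2f}  {g_i:.2f}")
```

The lattice is only a test case with a known exclusion zone; the disordered tilings discussed in
the text would be analyzed by passing the fitted peak positions instead.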

A biological implication of this type of receptor distribution is the fact that two individuals
seeing roughly the same antigen environment can have dramatically different optimal repertoires
aimed at targeting this specific set of antigens, while both manage to attain complete covering of
the antigens, as illustrated by Fig. 4B. Two individuals can see two slightly different versions of
the same environment simply due to sampling different antigens. Further discussion of the
immunological implications can be found in the original paper [27].

The appearance of the tiling pattern in the optimal solution depends on the value of
cross-reactivity. This is most easily seen in the limiting case of taking both the antigenic
environment and the cross-reactivity to be Gaussian functions [27]: the localized peaked receptor
distribution is optimal only when the variance of the cross-reactivity $\sigma$ is more than
$\sqrt{2}$ times larger than the variance of the pathogen distribution,
$\sigma > \sqrt{2}\,\sigma_Q$. The transition is continuous: the variance of the receptor
distribution decreases until it becomes a delta function.


IV. CONNECTION TO REAL SYSTEMS 

The reasoning presented above finds the form of idealized
optimal regulatory networks and repertoires. The role of such
an approach is not necessarily to explain the detailed form of
a given biological design. In fact, in both cases, and
especially clearly in the case of the immune repertoire, we
have greatly simplified the problem and not taken into account
important signaling and spatial dependencies of the system. Our
goal in both cases was to learn some general properties and
understand which ingredients lead to which type of
characteristics. However, despite this great simplification, we
do reproduce certain broad features of both gene regulation and
immune repertoires. In the first case, the separation of the
input space into domains of specific genes resembles the set-up
of gap genes in early fruit fly development. As we discussed in
detail in [7], adding interactions between the output genes,
such as exist between certain gap genes, allows us to reproduce
similar domains of expression bounded from low and high
concentration values by domains where the gene does not produce
proteins.

FIG. 4: Optimal receptor distribution $P_r^*$ for
two-dimensional random environments (A). The antigenic
landscape $Q_a$ is generated randomly from a log-normal
distribution with coefficient of variation $\kappa = 1$. (B) A
cartoon representation of the main characteristics of the
optimal receptor distribution: locally the receptors are placed
randomly and are non-overlapping, whereas at large distances
they uniformly tile space in a way that two locally distinct
patterns are indistinguishable on large scales. The lines guide
the eye to compare the two patterns. These properties are well
quantified by analyzing the tiling patterns using correlation
functions (C-D). (C) The radial distribution function of $P^*$,
$g(R)$, has an exclusion zone at small distances around each
peak, with a periodic pattern at large distances characteristic
of a local tiling pattern. (D) The normalized power spectral
density $S(q)$ of $P^*$ for different values of the coefficient
of variation of the environmental distribution, $\kappa$,
quantifies the heterogeneity of the antigenic landscape at
different scales. Fluctuations average out at large scales
(small $q$) to uniformly cover space, whereas locally (large
$q$) there are many possible placements for the receptors
(non-zero $S(q)$).
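
The kind of correlation analysis used to characterize the tiling patterns in Fig. 4 can be sketched on a toy example. The code below is not the two-dimensional optimization of the original work: it assumes a one-dimensional stand-in pattern with one peak placed at a random position inside each unit cell, and computes the structure factor $S(q) = |\sum_j e^{iqx_j}|^2/N$. The function name, the pattern, and all parameter values are illustrative.

```python
import numpy as np

def structure_factor(points, length, k_values):
    """S(q) = |sum_j exp(i q x_j)|^2 / N at wavenumbers q = 2*pi*k/length."""
    q = 2.0 * np.pi * np.asarray(k_values) / length
    phases = np.exp(1j * np.outer(q, points))  # shape (n_q, n_points)
    return np.abs(phases.sum(axis=1)) ** 2 / len(points)

rng = np.random.default_rng(0)
n_cells = 400
# Toy tiling: one peak per unit cell, at a random position inside the cell.
peaks = np.arange(n_cells) + rng.uniform(0.0, 1.0, size=n_cells)

# Small q probes large scales; large q probes local placement.
small_q = structure_factor(peaks, n_cells, np.arange(1, 6))
large_q = structure_factor(peaks, n_cells, np.arange(2 * n_cells + 1, 3 * n_cells))
print("mean S(q), small q:", small_q.mean())
print("mean S(q), large q:", large_q.mean())
```

In this toy pattern the small-$q$ values are strongly suppressed, reflecting uniform coverage on large scales, while the large-$q$ values stay of order one, reflecting the freedom of local placement.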

Direct comparison to experiments is harder in the case of
immune repertoires, since our predictions are made in the
effective recognition space, which has not yet been mapped out
experimentally, whereas experiments give us receptor sequences.
However, as noted above, this approach results in certain
concrete predictions. Specifically, we expect the repertoires
of two individuals to be different even if they are exposed to
the same environment. Controlling the environment for humans is
impossible, but detailed studies of shared sequences between
individuals show no more overlap than expected by chance
[47-50]. Zebrafish B-cell repertoire studies also showed no
correlation between the environment and the repertoire [?].
Another corollary of our predictions is that receptors should
form well separated clusters. Analysis of the amino acid
sequences of CDR3 regions in zebrafish B-cell repertoires
showed that the sequences cluster into a relatively small
number of attractors that are mostly different from the genetic
templates [?]. This hints that a tiling solution in effective
recognition space is not incompatible with the data.

FIG. 5: The tiling of space by sensory elements emerges as the
optimal design in two biologically different settings: gene
regulatory networks and immune repertoires. The precise
position of the tiling elements varies and is not uniform. It
is dictated by the matching between the properties of the
environment and the response elements. Yet the whole space is
fully covered (middle cartoon). The solution that maximizes
information transmission between the input and outputs of gene
regulatory networks tiles the concentration range (top), and
the cost to the organism is minimized when the repertoire tiles
the recognition space between receptors and antigens (bottom).
The details of the gene regulatory optimization are discussed
in the caption to Fig. 2. The optimal receptor distribution
$P^*$ (green line) in one dimension is shown for a random
environment with an antigenic landscape $Q_a$ (blue line)
generated randomly from a log-normal distribution with
coefficient of variation $\kappa = 1$. The optimal repertoire
is peaked; however, the coverage of antigenic space,
$\tilde{P}_a = \sum_r f_{r,a} P_r^*$, is close to uniform.


V. DISCUSSION 

In the case of the two systems considered here, the initial
formulation of the problem is very different. In both cases we
looked for optimal solutions; however, they are optimized for
very different quantities. The gene regulatory network is
optimized for information transmission between the input and
output, a common assumption in many sensing systems [?]. By
contrast, the immune repertoire is optimized to guarantee the
least costly response: to allow the immune system to respond to
a diversity of antigens with the smallest delay, weighted by
the potential harm of not responding. Nevertheless, in both
problems there is a trade-off given by the limitations of the
system. In gene regulation the cell does not have infinitely
many input molecules at its disposal, forcing it to distribute
its response genes. In the immune system the number of
encounters between antigens and receptors is finite, limiting
the number of potential recognition events. These two
trade-offs lead to similar types of solutions that show the
same kind of characteristic features on short and large scales.

Naturally, the biological nature of the gene regulatory and
repertoire problems is very different. Additionally, even in
terms of their formal theoretical description the systems are
very different. The gene regulatory problem involves placing
discrete genes in a continuous space, making the problem
discrete from the beginning. In the case of the repertoire the
space is continuous and the discrete distribution emerges by
itself.

Despite these differences, the gene regulatory network and the
immune repertoire show very similar tiling structures at
different scales (see Fig. 5 for a direct comparison). Locally
we observe clustering in both cases. The optimal repertoire has
the structure of discrete peaks, although the probability of
having receptors at a given position in recognition space is a
priori continuous. The receptors are discrete non-overlapping
entities. Similarly, the range of input concentrations in which
a given gene is expressed, and in which we can discriminate the
input concentration by measuring the output, is locally
limited. Reducing that input concentration range results in the
clustering of the regulatory properties of the output genes and
of the overall range of inputs they respond to. In both systems
we see this clustering of function on small scales: some areas
of recognition space or input concentration are directly
covered by a receptor or a given gene, whereas nearby areas are
not. Yet on large scales, in both cases the whole space is
nearly uniformly covered by genes or receptors, as seen
explicitly from the close to uniform coverage of the antigens
by the receptor distribution,
$\tilde{P}_a = \sum_r f_{r,a} P_r$ [27]. So at large scales we
observe the characteristic tiling of all of space by the total
distribution of receptors, which leaves no part of effective
space uncovered.
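
How a sharply peaked receptor distribution can still give nearly uniform coverage is easy to check numerically. The sketch below evaluates the coverage $\sum_r f_{r,a} P_r$ on a periodic one-dimensional recognition space. The Gaussian form of the cross-reactivity, the regular spacing of the peaks, and all numerical values are assumptions made for illustration, not the optimized solution of the original work.

```python
import numpy as np

n = 200          # sites on a periodic 1D recognition space (illustrative)
sigma = 5.0      # cross-reactivity width, assumed Gaussian
spacing = 10     # distance between receptor peaks (illustrative)
x = np.arange(n)

# Peaked receptor distribution P_r: equal weight on every `spacing`-th site.
P = np.zeros(n)
P[::spacing] = 1.0
P /= P.sum()

# Cross-reactivity f[r, a], using the shortest distance on the ring.
d = np.abs(x[:, None] - x[None, :])
d = np.minimum(d, n - d)
f = np.exp(-d**2 / (2.0 * sigma**2))

coverage = f.T @ P   # coverage[a] = sum_r f[r, a] * P[r]

def coefficient_of_variation(v):
    """Standard deviation divided by the mean."""
    return v.std() / v.mean()

print("CV of receptor distribution:", coefficient_of_variation(P))
print("CV of coverage:", coefficient_of_variation(coverage))
```

The receptor distribution itself is highly non-uniform, while the coverage varies only weakly across the space once the cross-reactivity width is comparable to the spacing between peaks.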

The analogy depicted in Fig. 5 can be made more explicit in
terms of the parameters of the two problems. The specific
placement of the receptor corresponds to the concentration at
half-maximum expression of the gene ($K_i$), that is, the
placement of the gene in input concentration space. And just as
cross-reactivity ensures that areas of recognition space where
there are no receptors remain covered, the scale
$\sim K_i/h_i$ sets the range of input for which a given gene
differentially responds to input concentrations.
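
The statement that a scale of order $K/h$ sets the range of differential response can be checked with a short sketch. It assumes, for illustration only, that the mean expression follows a Hill function with half-maximum concentration $K$ and cooperativity $h$ (taken here to be what $K_i$ and $h_i$ denote); the function names are hypothetical. For this form the slope at $c = K$ is $h/(4K)$, so the inverse slope, a natural measure of the response range, is $4K/h$.

```python
def hill(c, K, h):
    """Mean expression of an activated gene, assumed to follow a Hill function."""
    return c**h / (c**h + K**h)

def response_width(K, h, eps=1e-6):
    """Inverse slope at half-maximum, estimated by a central difference."""
    upper = hill(K * (1 + eps), K, h)
    lower = hill(K * (1 - eps), K, h)
    slope = (upper - lower) / (2 * eps * K)
    return 1.0 / slope

for K, h in [(1.0, 2.0), (1.0, 4.0), (3.0, 2.0)]:
    width = response_width(K, h)
    print(f"K = {K}, h = {h}: width = {width:.3f}, 4K/h = {4 * K / h:.3f}")
```

Increasing $K$ shifts the gene to higher concentrations and widens its range, while increasing $h$ narrows it, in the same way that the receptor position and the cross-reactivity width play separate roles in the repertoire problem.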

This structure of local clustering and global tiling of space
depends on the ratio of cross-reactivity to the size of the
space for immune receptors, and on the value of the maximum
input concentration scaled in natural units of concentration
for gene regulation. Natural units of concentration result from
the characteristic physical timescales of regulation (diffusion
constant $D$, signal protein integration time $\tau$, the
maximum number of independent molecules of the output
$N_{\max}$, and the typical size of the binding site $a$):
$c_0 = N_{\max}/(D a \tau)$. If we consider a relatively large
effective space (whether it is the input concentration range or
recognition space), the optimal solution will consist of
$L > 1$ genes and $L > 1$ receptors that use cross-reactivity
to cover the whole pathogenic space. If we now decrease this
effective space, the optimal genetic solution will reduce to
one gene, and similarly the whole recognition space can easily
be covered by one receptor. The discrete structure of the
optimal immune repertoire distribution depends on the
small-scale noise. If there were no intrinsic variability at
small scales, the optimal distributions would be continuous,
since nothing would differentiate between particular points in
space. This effect can be seen in the limiting case of Gaussian
antigen distributions, where there is no small-scale noise at
all and the optimal receptor distribution is also Gaussian
(given a Gaussian cross-reactivity function).
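
As a worked example of the natural concentration unit, the sketch below evaluates $c_0 = N_{\max}/(D a \tau)$. The numerical values are placeholders chosen only to show how the units combine; they are not taken from the text, and the function name is hypothetical.

```python
def natural_concentration(n_max, diffusion, site_size, integration_time):
    """c_0 = N_max / (D * a * tau), in molecules per unit volume."""
    return n_max / (diffusion * site_size * integration_time)

# Placeholder values, not taken from the text.
n_max = 100.0             # maximum number of independent output molecules
diffusion = 1.0           # D, in um^2 / s
site_size = 3e-3          # a, in um
integration_time = 600.0  # tau, in s

c0 = natural_concentration(n_max, diffusion, site_size, integration_time)
print(f"c_0 = {c0:.1f} molecules per um^3")
```

The product $D a \tau$ has units of volume, so $c_0$ is a concentration; the maximum input concentration divided by $c_0$ is then the dimensionless quantity that, according to the text, controls whether one gene or several genes are optimal.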

The fact that solutions that optimize information in a finite
space result in discrete solutions that tile space has been
known for a long time [55, 56] and has been studied in a number
of systems from neuroscience [57] to ecology [58]. The only way
to obtain a continuous optimal distribution when optimizing
information is to consider an unbounded space. However, the
tiling solution discussed here in terms of gene regulation is
slightly different. The genes are already discrete, imposing
the discrete structure on the problem. The input concentration
range adds an additional layer of potential discreteness
(clustering) that is different from the discreteness of the
information-optimal distributions discussed in other contexts.
The discrete genes are separated in concentration space, which
they tile, building a discrete structure from already discrete
units. This type of solution is analogous to the optimal tiling
of the visual receptive range by retinal ganglion cells
[59, 60] and is linked to the hypothesis of efficient coding
[22]. The pattern of repressing neurons that process visual
stimuli in the retina ("lateral inhibition") has been proposed
as a way to remove redundancy in the encoding of the signal
that comes from correlations in the environment (visual
stimulus) and the response (receptive field).

More generally, the appearance of discretized solutions in
continuous systems, similar to the tiling patterns described
here, has been described in many different contexts. Many of
these analogies, pointed out to me by Olivier Rivoire and
Thierry Mora, are linked to bet-hedging strategies [?], with
applications ranging from phenotypic stability [62],
competition of individuals for resources [63], ecological
models of population regulation [61], and neural coding [?] to
portfolio risk management [65]. Of course, discrete designs of
components of naturally occurring biological systems are very
common. For example, in neuroscience both the visual system
[66] and the olfactory system [67] use discrete receptors to
respond to continuous or quasi-continuous inputs, whereas the
existence of species is responsible for the interest of
ecologists in this topic. The retina has been characterized in
great detail experimentally [68], and detailed theoretical
predictions have been put forward based on the idea of
efficient coding, which hypothesizes that signal processing has
evolved to optimally encode natural stimuli while minimizing
resources, to the extent that it is becoming possible to test
these predictions against the existing neural circuitry
[69-72].

Lastly, we can ask what the link is between these optimal
solutions, which are obtained in a static setting, and real
dynamical biological systems. How is the structure of
non-overlapping genes and well separated receptors that
nevertheless manage to completely cover space achieved in terms
of the natural dynamics of the systems? As was explicitly shown
in [27], a simple birth-death dynamics of receptor clones, in
which receptors compete for interactions with antigens, results
in exactly the same optimal repertoire distributions as
discussed above. There, mutual competitive exclusion is
responsible for the characteristic tiling structure. For genes,
the concentration ranges in which genes function are also
mutually exclusive, as evidenced by their distinct dissociation
constants. In our information optimization setup, the genes do
not interact. Yet the system achieved this solution by means of
long-term evolution, which favored this non-overlapping state.
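
The idea of receptor clones competing for antigens can be sketched as a minimal deterministic birth-death model. This is not the dynamics of the cited work: the Gaussian cross-reactivity, the saturating availability term $1/(1+\text{coverage})^2$, the death rate, and all numerical values are assumptions chosen only to show how competition for shared antigens couples the growth of neighboring clones.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = np.arange(n)
d = np.abs(x[:, None] - x[None, :])
d = np.minimum(d, n - d)                        # distances on a ring
f = np.exp(-d**2 / (2.0 * 3.0**2))              # assumed cross-reactivity f[r, a]
Q = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # random antigen frequencies
Q /= Q.sum()

def step(N, dt=0.1, death=0.01):
    """Advance the clone sizes N by one time step of length dt."""
    coverage = f.T @ N                          # receptors seen by each antigen
    availability = 1.0 / (1.0 + coverage) ** 2  # assumed competition term
    growth = f @ (Q * availability) - death     # net growth rate of each clone
    return N * np.exp(dt * growth)              # multiplicative update, N stays positive

N = np.ones(n)
for _ in range(20000):
    N = step(N)
print("clone sizes relative to the largest:")
print(np.round(N / N.max(), 2))
```

Because each clone's growth rate decreases when nearby clones already cover the same antigens, clones that share antigens suppress one another, which is the competitive exclusion mechanism referred to above.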


In fact, allowing two output genes to interact results in an
optimal network where the gene with a higher dissociation
constant represses the expression of the gene with a smaller
dissociation constant. This further restricts the range of
activity of the gene with the lower dissociation constant,
since it gets turned completely off beyond a certain
concentration. In this interacting gene example the exclusion
is encoded both in the dissociation constants and in the direct
negative regulation. In general, evolution can use either one
of these methods to arrive at non-overlapping concentration
ranges where these genes function, and hence at tiling
solutions. Negative feedback or competition is essential for
the dynamics to reach these locally clustered, globally tiled
solutions in other systems as well. Such behavior has been
termed competitive exclusion in ecology [73], where it is
invoked as a reason for the emergence of species, and lateral
inhibition in neuroscience [74].